Abstract. This short paper summarizes the development of ColloCaid
(www.collocaid.uk), a text editor that supports writers with academic English collocations.
After a brief introduction, the paper summarizes how the lexicographic database 
underlying ColloCaid was compiled, how text editor integration was achieved, 
and results from initial user studies. The paper concludes by outlining future 
developments. 
1. Introduction


Research has shown that less experienced users of academic English have a limited 
repertoire of collocations (Frankenberg-Garcia, 2018). Indeed, collocations like 
REACH + conclusion are among the most frequent look-ups among novice users of
written academic English (Yoon, 2016). 


There are a number of tools and resources that academic writers can use to 
search for such idiomatic combinations of words. These include general English 
dictionaries and more targeted ones like the Longman Collocations Dictionary and 
Thesaurus (Mayor, 2013) or the Oxford Learner's Dictionary of Academic English
(Lea, 2014). Writers familiar with corpora can also consult general English corpora
like the BNC and COCA, and corpora of student papers like BAWE (Nesi, 2011)
and MICUSP (Römer & Swales, 2010). Other useful tools include SKELL (Baisa
& Suchomel, 2014), arguably the easiest-to-use English corpus available; FlaxLC
(Wu, Fitzgerald, Yu, & Witten, 2019), a learner-friendly corpus-based collocation
tool; and LEAD (Granger & Paquot, 2015), an academic English
dictionary-cum-corpus.


However, writers may not know where or how to look up collocations
(Frankenberg-Garcia, 2011), or may simply not realize that their emerging texts could be made
more idiomatic (Frankenberg-Garcia, 2014; Laufer, 2011). Moreover, even when 
writers realize they need help, looking up collocations while writing can be 
distracting and disruptive (Yoon, 2016). 


To address this challenge, we are developing a text editor that assists writers with 
academic English collocations (Frankenberg-Garcia et al., 2019a). ColloCaid 
provides writers with collocation suggestions as they write, helping them find 
idiomatic combinations of words and expand their collocational repertoire. 
ColloCaid can also be used to revise collocations in existing drafts. 


2. Lexicographic database 


The ColloCaid lexicographic database aims to address core collocations used 
across disciplines in general academic English. As detailed in Frankenberg-Garcia 
et al. (2019a), it draws on the noun, verb and adjective lemmas that occur in at least 
two of three well-known academic vocabulary lists: the Academic Keyword List 
(Paquot, 2010), the Academic Collocation List (Ackermann & Chen, 2013), and 
the Durrant (2016) subset of the Gardner and Davies (2014) Academic Vocabulary 
List. 
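The "at least two of three lists" criterion can be illustrated with a minimal sketch. The function name select_lemmas and the toy sets below are hypothetical placeholders for illustration only; they do not reproduce the actual contents of the three vocabulary lists or the project's own selection code.

```python
from collections import Counter


def select_lemmas(word_lists, min_lists=2):
    """Return the lemmas that occur in at least `min_lists` of the given lists."""
    counts = Counter()
    for lemmas in word_lists:
        # Convert to a set so a lemma is counted once per list,
        # even if a list happens to contain duplicates.
        counts.update(set(lemmas))
    return sorted(lemma for lemma, n in counts.items() if n >= min_lists)


# Toy placeholder data (NOT the real list contents).
list_a = {"conclusion", "figure", "aim"}
list_b = {"conclusion", "figure", "actual"}
list_c = {"conclusion", "aim", "paper"}

print(select_lemmas([list_a, list_b, list_c]))
# ['aim', 'conclusion', 'figure']
```

In this toy run, a lemma found in only one list is excluded, which is the kind of threshold effect that the manual revisions described next are meant to correct.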


The original selection of lemmas has been revised to (1) disambiguate polysemy 
(e.g. figure as image, as number and as person); (2) include homographs used in 
academic contexts (e.g. aim was initially only listed as a noun, but its less frequent 
verbal lemma was added to avoid the impression that only the noun was idiomatic); 
(3) discard lemmas that are not collocationally productive (e.g. actual); and (4) 
add high-frequency interdisciplinary academic lemmas like paper and table, which 
slipped through initial selection thresholds (Rees et al., 2019). 


The database was populated with interdisciplinary collocates pertaining to the 
above lemmas extracted from corpora of expert academic English writing. As 
detailed in Frankenberg-Garcia et al. (2019a), this was undertaken using Sketch
Engine (Kilgarriff et al., 2014), which automatically summarizes the main
collocations of a lemma in a corpus. Issues with the extraction were dealt
with using lexicographic judgment on a case-by-case basis. This included, for
example, overruling the classification of regard as a verb, since its primary use in 
academic texts is preposition-like, in contexts such as decisions regarding safety, 
or in prepositional phrases like with regard to (Rees et al., 2019). 


The database was further populated with authentic examples of collocations in 
use, selected according to typicality, informativity, and intelligibility. Examples 
were also curated to address language production needs and maximize their 
potential for data-driven learning, as explained in Frankenberg-Garcia (2014). 
Figure 1 summarizes the lexical coverage of ColloCaid in its current 0.4 version
(20 September 2019). 


Figure 1. ColloCaid 0.4 lexicographic database 
(+22206 extra)
3. Text editor integration


Academic writers from different disciplines have their own preferred operating 
systems and text editors. In our interdisciplinary research team, for example, papers 
initiated by the linguists are normally drafted in a Windows environment using 
Microsoft Word, whereas the computer scientists prefer to use Macs and LaTeX 
editors. For developing a prototype and testing it with different users, we opted 
for an online editor that can be accessed from a normal browser compatible with 
multiple devices and operating systems, without the need to download additional
software. TinyMCE (https://www.tiny.cloud/), a widely used open-source editor
that looks like any regular editor, was selected for this purpose (Figure 2: A).


We adopted a dynamic, data-driven learning approach to the integration of the 
lexicographic data into the editor. It is data-driven because collocation suggestions
are shown rather than explained. It is dynamic because collocations are displayed 
only when wanted, and in as much detail as desired, via progressive interactive 
menus (Figure 2: B-E). 
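The idea of showing collocations "in as much detail as desired" can be sketched as a nested data structure queried one level at a time. This is only an illustrative assumption about how such progressive menus might be modelled, not a description of the ColloCaid implementation: the class names, the menu function, the three-level layout, and the example sentences are all invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Collocate:
    word: str
    examples: List[str] = field(default_factory=list)


@dataclass
class LemmaEntry:
    lemma: str
    # Maps a collocation-type label to the collocates of that type.
    collocates: Dict[str, List[Collocate]] = field(default_factory=dict)


def menu(entry: LemmaEntry, path: Tuple[str, ...] = ()) -> List[str]:
    """Return the options for the menu level reached by following `path`.

    Level 0: collocation types; level 1: collocates; level 2: examples.
    """
    if len(path) == 0:
        return list(entry.collocates)
    group = entry.collocates[path[0]]
    if len(path) == 1:
        return [c.word for c in group]
    if len(path) == 2:
        for c in group:
            if c.word == path[1]:
                return list(c.examples)
        raise KeyError(path[1])
    raise ValueError("path is deeper than the three menu levels")


# Invented toy entry; the sentences are placeholders, not database examples.
entry = LemmaEntry(
    "conclusion",
    {
        "verb + conclusion": [
            Collocate("reach", ["Both studies reach the same conclusion."]),
            Collocate("draw", ["It is difficult to draw firm conclusions."]),
        ]
    },
)

print(menu(entry))                                  # collocation types
print(menu(entry, ("verb + conclusion",)))          # collocates
print(menu(entry, ("verb + conclusion", "reach")))  # examples
```

Each call returns only the next level of detail, so nothing is displayed until the writer asks for it.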


Figure 2. ColloCaid editor 


4. Initial user studies 


Development versions of ColloCaid have been tested during university writing 
workshops and seminars in Brazil, France, Poland, and Spain (Frankenberg-Garcia 
et al., 2019b). Participants (N=122) included novice and expert L2 English writers 
from a wide range of disciplines. Due to space restrictions, we are only able to 
present here the scores obtained on the Brooke (2013) System Usability Scale 
(SUS). The SUS is a standard for measuring the usability of systems (hardware, 
software, websites, etc.), with the advantage that its results can be compared on the 
same scale with hundreds of other systems. It comprises ten alternating positive 
and negative statements about system usability which users rate with a Likert-type 
scale. As shown in Figure 3, the SUS scores obtained for ColloCaid are between
good and excellent (and above the SUS average of around 70), despite known bugs 
and minor issues with the lexicographic database. 
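For readers unfamiliar with how a SUS score is derived, the standard calculation can be sketched as follows. The sketch assumes the usual five-point response scale (1 = strongly disagree, 5 = strongly agree) with odd-numbered items positively worded and even-numbered items negatively worded; the function name sus_score is ours, and the sample responses are invented rather than taken from the user studies.

```python
def sus_score(responses):
    """Compute a 0-100 SUS score from ten responses on a 1-5 scale.

    `responses` lists the answers to items 1..10 in order.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item, r in enumerate(responses, start=1):
        if item % 2 == 1:
            # Positively worded item: higher agreement is better.
            total += r - 1
        else:
            # Negatively worded item: lower agreement is better.
            total += 5 - r
    # Each item contributes 0-4, so the sum is 0-40; scale to 0-100.
    return total * 2.5


# Invented responses: agree (4) with positive items, disagree (2) with negative ones.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

A study-level score such as those in Figure 3 would then be the mean of the individual participants' scores.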


Figure 3. Usability scores of ColloCaid v0.1 to v0.3 and interpretation of SUS 
values (right) according to Bangor, Kortum, and Miller (2009) 


[Figure: SUS scores for Poznan 0.1, Paris 0.1, Brazil 0.2, Brazil 0.3, and León 0.3, shown against the SUS average and an adjective scale running from Worst imaginable to Best imaginable]


5. Conclusion and future work 


Previous studies on academic writing needs and dictionary use have led us to 
develop a text editor integrated with a large lexical database of general academic
English collocation suggestions, enriched with corpus examples of collocations in 
use. Our prototype, which draws on the principle of dynamic data-driven learning, 
has been well received by L2 users of academic English, scoring between good and 
excellent on the SUS. Future development of ColloCaid includes adjustments to the 
lexical database (i.e. expanding and proofreading current coverage), experimenting 
with new ways of visualizing collocations, and further user testing with
think-aloud and diary studies.